Skip to content

fix(mcp): prevent leaked semaphore warnings during MCPServerStdio cleanup#2802

Draft
4444J99 wants to merge 2 commits intoopenai:mainfrom
4444J99:fix/mcp-stdio-semaphore-leak
Draft

fix(mcp): prevent leaked semaphore warnings during MCPServerStdio cleanup#2802
4444J99 wants to merge 2 commits intoopenai:mainfrom
4444J99:fix/mcp-stdio-semaphore-leak

Conversation

@4444J99
Copy link
Copy Markdown

@4444J99 4444J99 commented Mar 28, 2026

Summary

Fixes the "leaked semaphore objects to clean up at shutdown" warning that occurs when using MCPServerStdio with MCP servers that use Python multiprocessing internally.

Root cause: During cleanup(), the exit_stack.aclose() unwinds contexts in LIFO order — ClientSession.__aexit__ runs first (cancelling its internal task group), then the transport's __aexit__ runs. Because the ClientSession cancellation kills the stdout_reader/stdin_writer tasks inside the transport, the transport's finally block — which closes stdin and waits for the subprocess — may never execute cleanly. The subprocess is killed without a chance to flush its multiprocessing resources (semaphores, shared memory, etc.), triggering Python's resource_tracker warning.

Fix: Before calling exit_stack.aclose(), explicitly close the session's write stream. This delivers EOF to the subprocess's stdin, giving it a window to release its resources and exit cleanly before the task groups are cancelled. Also reinitialise exit_stack and clear server_initialize_result during cleanup so the same server instance can reconnect.

Test plan

  • Added test_cleanup_resets_state_for_reconnection — verifies all session state is cleared and exit stack is replaced after cleanup
  • Added test_reconnect_after_cleanup — verifies a server instance can connect, use tools, cleanup, then reconnect and use tools again
  • Added test_cleanup_closes_write_stream_before_exit_stack — verifies the write stream close happens before exit stack unwind (the critical ordering invariant)
  • All 175 existing MCP tests pass
  • ruff check and ruff format pass

Issue number

Closes #618

Checks

  • I've added new tests (if relevant)
  • I've added/updated the relevant documentation
  • I've run make lint and make format
  • I've made sure tests pass

@github-actions github-actions bot added bug Something isn't working feature:mcp labels Mar 28, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4468bf6f4d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +1002 to +1004
# Reinitialize the exit stack so the same server instance can
# reconnect cleanly after cleanup.
self.exit_stack = AsyncExitStack()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve original exit stack when cleanup is interrupted

Reinitializing self.exit_stack in finally discards pending cleanup callbacks even when await self.exit_stack.aclose() is interrupted (for example by CancelledError, which this method explicitly re-raises). In that case the old stack may still hold open transport/process contexts, but after reassignment there is no way to retry closing them, so subprocesses can be orphaned and the semaphore leak warning can persist during shutdown. Only replacing the stack after a successful aclose() avoids this resource-leak path.

Useful? React with 👍 / 👎.

@seratch
Copy link
Copy Markdown
Member

seratch commented Mar 29, 2026

Thanks for sharing this. I also explored a viable solution for this, but I found that the sufficient fix requires substantial changes across this SDK. So, I don't think the suggested changes here is enough for the issue. See also: #2573 (comment)

@seratch seratch marked this pull request as draft March 29, 2026 00:01
@4444J99
Copy link
Copy Markdown
Author

4444J99 commented Mar 30, 2026

Understood — happy to contribute to the broader fix in #2573 if that's useful.

@github-actions
Copy link
Copy Markdown
Contributor

This PR is stale because it has been open for 10 days with no activity.

@github-actions github-actions bot added the stale label Apr 10, 2026
@4444J99
Copy link
Copy Markdown
Author

4444J99 commented Apr 14, 2026

@seratch — following up on your feedback from March 29. A few developments since then:

  1. Fix #618 MCPServerStdio leaked semaphores warning at shutdown #2573 was closed (March 2) without merging, so the broader solution it represented hasn't landed.
  2. fix(mcp): reset AsyncExitStack after cleanup for reconnect (#618) #2882 was just opened today, addressing the AsyncExitStack reset portion of the reconnect problem — this overlaps with part of what this PR does.
  3. This PR (fix(mcp): prevent leaked semaphore warnings during MCPServerStdio cleanup #2802) addresses the root cause more broadly: it closes the write stream before unwinding the exit stack, giving the subprocess a clean shutdown window for its multiprocessing resources (semaphores, shared memory). The exit-stack reset and reconnect support are also included.

You mentioned the fix here isn't sufficient and that substantial changes are needed across the SDK. Could you share more about what that broader scope looks like? I'd like to understand the architectural direction so I can iterate meaningfully rather than guess.

Happy to coordinate with #2882 if that helps reduce duplicate effort.

4444J99 and others added 2 commits April 14, 2026 12:33
…ent semaphore leaks

When MCPServerStdio.cleanup() tears down via exit_stack.aclose(), the
ClientSession exits first (cancelling its task group), then the transport
exits.  The ClientSession cancellation kills the reader/writer tasks inside
the transport before the transport's finally block can close stdin and wait
for the subprocess to exit gracefully.  This race prevents the subprocess
from flushing multiprocessing resources, causing "leaked semaphore" warnings
from Python's resource_tracker.

Fix: explicitly close the session's write stream before the exit_stack
unwind.  This delivers EOF to the subprocess's stdin, giving it a window to
release its resources and exit cleanly before task groups are cancelled.

Also reinitialise the exit_stack and clear server_initialize_result during
cleanup so the same server instance can reconnect after cleanup.

Closes openai#618
mypy now reports [method-assign] instead of [assignment] for
monkey-patched async methods. Update type: ignore comments to
use the narrower code so `make typecheck` passes cleanly.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@4444J99 4444J99 force-pushed the fix/mcp-stdio-semaphore-leak branch from 4468bf6 to 08c555a Compare April 14, 2026 16:36
@4444J99
Copy link
Copy Markdown
Author

4444J99 commented Apr 14, 2026

@seratch — following up on the "substantial changes" feedback. I've traced the dependency chain:

Root cause is in the MCP library, not the SDK:

  • modelcontextprotocol/python-sdk#1960stdio_client has a race condition during shutdown. The stdout_reader task and cleanup code race when the context exits, causing BrokenResourceError and leaked semaphores.
  • modelcontextprotocol/python-sdk#2268 — an open fix PR addressing this (March 16, no review yet).

What our PR does is bridge the gap at the SDK level: by closing session._write_stream before exit_stack.aclose(), we deliver EOF to the subprocess stdin, giving it a chance to flush resources before task group cancellation. This is admittedly reaching into a private attribute of ClientSession — which is the fragility you flagged.

Two paths forward:

  1. Keep this as a bridge: our fix works now, prevents the semaphore warning for users, and can be removed once mcp >= 1.27 (or whichever version includes Enhance exception handling in MCP server initialization and cleanup #2268) is released. We could add a # TODO: remove once mcp library fixes #1960 comment.

  2. Wait for the library fix: close this PR, and instead bump the mcp dependency lower bound once Enhance exception handling in MCP server initialization and cleanup #2268 lands. No SDK-level workaround needed.

Happy to go either direction based on your preference. If option 1, I can also add a version check to only apply the workaround on mcp < X once the threshold is known.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working feature:mcp stale

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Resource tracker warning (leaked semaphores) with MCPServerStdio

2 participants